The FCC (Fluid Catalytic Cracking) condenser is a critical component in refinery operations. The unit processes sweet Vacuum Gas Oil (VGO) as feed and cracks the heavy oil into lighter products such as LPG and gasoline.
A reduction in air flow rate to the FCC regenerator lowers its temperature, which can result in incomplete combustion ("behind the burn"), reduced catalyst regeneration efficiency, and a decline in plant throughput. To ensure safety, this situation may trigger an Emergency Shutdown (ESD). This project utilizes Machine Learning (ML) techniques to detect anomalies in FCC condenser efficiency.
The dataset contains operational parameters, including temperature, pressure, flow rates, and velocities. The following preprocessing steps were performed:
EDA was conducted to visualize trends and detect potential anomalies. Key findings:
1. Principal Component Analysis (PCA)
PCA reduced high-dimensional data to two principal components.
Anomalies were detected by analyzing data points far from the normal cluster.
2. Autoencoder (Deep Learning)
A neural network trained to reconstruct normal data patterns.
Reconstruction error was used to flag high-error data points as anomalies.
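As a rough, self-contained sketch of the PCA idea (synthetic data and a made-up control limit of my own choosing, not the FCC dataset or the notebook's exact procedure):

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for process data: two dominant directions of variation
rng = np.random.default_rng(0)
scales = np.array([3.0, 2.0] + [0.5] * 8)
normal = rng.normal(size=(200, 10)) * scales     # normal operating region
anomaly = rng.normal(size=(5, 10)) * scales
anomaly[:, :2] += 20.0                           # push 5 points far from the cluster
data = np.vstack([normal, anomaly])

pca_demo = PCA(n_components=2).fit(normal)       # fit on normal data only
scores = pca_demo.transform(data)

# Distance from the centre of the normal cluster in the 2-D PC space
dist = np.linalg.norm(scores - scores[:200].mean(axis=0), axis=1)
threshold = np.percentile(dist[:200], 99)        # control limit from normal data
flags = dist > threshold                         # the injected anomalies exceed it
```

The same pattern — fit on normal operation only, derive a control limit from the normal data, then score new samples against it — is what the notebook applies to the FCC data below.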
Importing libraries and datasets
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import tensorflow as tf
plt.rcParams['figure.figsize'] = [12, 6]
plt.rcParams.update({'font.size': 12})
df = pd.read_excel('columns.xlsx')
col = df['Symbol'].values
col = col.tolist()
df_stableFeedFlow = pd.read_csv(r'https://raw.githubusercontent.com/AshuPraja13/FCC-abnormality-detection/main/NOC_stableFeedFlow_outputs.csv',header=None)
df_stableFeedFlow.index = df_stableFeedFlow.iloc[:,0]
df_stableFeedFlow = df_stableFeedFlow.drop(columns=0)
df_stableFeedFlow.columns= col
df_stableFeedFlow.sample(5)
| | F3 | Tatm | T1 | P4 | deltaP | P6 | Fair | T3 | T2 | Tr | ... | FLCO | FSlurry | FReflux | Tfra | T10 | T20 | V9 | V8 | V10 | V11 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2510 | 164.81 | 78.793 | 460.44 | 34.4 | -6.4 | 28 | 2.6791 | 1565.6 | 615.99 | 969.03 | ... | 1641.5 | 214.26 | 2972.2 | 313.64 | 509.70 | 627.94 | 47.332 | 49.465 | 49.062 | 47.113 |
| 2250 | 165.12 | 79.937 | 461.05 | 34.4 | -6.4 | 28 | 2.6831 | 1563.9 | 616.00 | 969.01 | ... | 1648.5 | 210.68 | 3010.2 | 314.99 | 509.90 | 628.21 | 45.753 | 49.871 | 49.654 | 47.494 |
| 1550 | 165.11 | 76.402 | 461.20 | 34.4 | -6.4 | 28 | 2.6796 | 1562.9 | 616.00 | 968.95 | ... | 1644.8 | 212.73 | 2952.6 | 311.70 | 509.74 | 628.01 | 49.726 | 49.476 | 49.609 | 47.294 |
| 1035 | 165.12 | 79.052 | 460.65 | 34.4 | -6.4 | 28 | 2.6847 | 1566.3 | 615.99 | 969.04 | ... | 1647.3 | 207.21 | 2989.2 | 314.01 | 509.85 | 628.00 | 46.855 | 49.687 | 48.861 | 47.431 |
| 2047 | 164.86 | 79.886 | 461.45 | 34.4 | -6.4 | 28 | 2.6766 | 1559.8 | 616.00 | 968.97 | ... | 1633.3 | 220.75 | 2972.4 | 314.41 | 509.57 | 627.92 | 46.378 | 49.340 | 49.324 | 46.653 |
5 rows × 46 columns
df_varyingFeedFlow=pd.read_csv(r'https://raw.githubusercontent.com/AshuPraja13/FCC-abnormality-detection/main/NOC_varyingFeedFlow_outputs.csv',header=None)
df_varyingFeedFlow.index = df_varyingFeedFlow.iloc[:,0]
df_varyingFeedFlow = df_varyingFeedFlow.drop(columns=0)
df_varyingFeedFlow.columns= col
df_varyingFeedFlow.sample(5)
| | F3 | Tatm | T1 | P4 | deltaP | P6 | Fair | T3 | T2 | Tr | ... | FLCO | FSlurry | FReflux | Tfra | T10 | T20 | V9 | V8 | V10 | V11 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5059 | 162.09 | 80.015 | 460.63 | 34.4 | -6.4 | 28 | 2.6343 | 1547.7 | 616.00 | 969.02 | ... | 1592.0 | 231.02 | 2575.4 | 307.04 | 505.86 | 622.37 | 53.798 | 43.872 | 47.175 | 44.676 |
| 2478 | 164.47 | 79.050 | 461.41 | 34.4 | -6.4 | 28 | 2.6695 | 1557.7 | 616.00 | 968.95 | ... | 1630.7 | 222.70 | 2904.2 | 312.66 | 509.04 | 627.10 | 48.261 | 48.494 | 49.154 | 46.557 |
| 9389 | 165.23 | 80.033 | 460.58 | 34.4 | -6.4 | 28 | 2.6840 | 1567.5 | 616.01 | 969.00 | ... | 1643.5 | 211.99 | 3015.5 | 315.17 | 509.97 | 628.21 | 45.564 | 49.931 | 48.979 | 47.204 |
| 1497 | 166.99 | 75.731 | 461.42 | 34.4 | -6.4 | 28 | 2.7136 | 1573.2 | 616.00 | 969.01 | ... | 1675.3 | 211.15 | 3280.2 | 317.02 | 512.42 | 631.67 | 44.666 | 53.997 | 51.098 | 48.790 |
| 82 | 165.05 | 76.114 | 460.42 | 34.4 | -6.4 | 28 | 2.6792 | 1567.3 | 615.99 | 968.97 | ... | 1638.2 | 215.91 | 2943.7 | 311.38 | 509.69 | 628.02 | 50.118 | 49.383 | 49.131 | 46.919 |
5 rows × 46 columns
df_CAB_valveLeak = pd.read_csv(r'https://raw.githubusercontent.com/AshuPraja13/FCC-abnormality-detection/main/CAB_valveLeak_outputs.csv',header=None)
df_CAB_valveLeak.index = df_CAB_valveLeak.iloc[:,0]
df_CAB_valveLeak = df_CAB_valveLeak.drop(columns=0)
df_CAB_valveLeak.columns= col
df_CAB_valveLeak.sample(5)
| | F3 | Tatm | T1 | P4 | deltaP | P6 | Fair | T3 | T2 | Tr | ... | FLCO | FSlurry | FReflux | Tfra | T10 | T20 | V9 | V8 | V10 | V11 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 197 | 165.18 | 77.406 | 461.11 | 34.4 | -6.4 | 28.0 | 2.6816 | 1563.8 | 615.99 | 968.98 | ... | 1639.0 | 211.98 | 2960.0 | 312.47 | 509.75 | 627.97 | 48.717 | 49.468 | 48.745 | 46.961 |
| 806 | 164.92 | 79.932 | 460.89 | 34.4 | -6.4 | 28.0 | 2.6809 | 1563.7 | 616.00 | 969.02 | ... | 1645.1 | 215.73 | 3011.8 | 315.08 | 509.89 | 628.31 | 45.701 | 49.881 | 49.905 | 47.298 |
| 902 | 164.92 | 79.636 | 461.34 | 34.4 | -6.4 | 28.0 | 2.6813 | 1560.9 | 616.00 | 969.04 | ... | 1644.2 | 213.84 | 2999.0 | 314.65 | 509.84 | 628.15 | 46.168 | 49.741 | 49.456 | 47.251 |
| 1215 | 165.20 | 77.640 | 460.40 | 34.4 | -6.4 | 28.0 | 2.6815 | 1568.3 | 615.99 | 968.96 | ... | 1648.9 | 212.58 | 2992.6 | 313.26 | 509.96 | 628.34 | 47.964 | 49.871 | 49.755 | 47.506 |
| 31 | 165.02 | 75.406 | 461.02 | 34.4 | -6.4 | 28.0 | 2.6804 | 1563.5 | 615.99 | 968.99 | ... | 1640.3 | 214.24 | 2945.1 | 310.94 | 509.80 | 628.09 | 50.727 | 49.484 | 48.764 | 47.031 |
5 rows × 46 columns
EDA
df_CAB_valveLeak.info()
<class 'pandas.core.frame.DataFrame'> — RangeIndex: 1440 entries, 0 to 1439; 46 columns (F3 through V11), each with 1440 non-null values; dtypes: float64(42), int64(4); memory usage: 517.6 KB
df_CAB_valveLeak.describe().T
| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| F3 | 1440.0 | 165.012694 | 1.582447e-01 | 164.530000 | 164.890000 | 165.020000 | 165.140000 | 165.480000 |
| Tatm | 1440.0 | 78.342193 | 1.491834e+00 | 74.981000 | 77.191500 | 78.746000 | 79.687250 | 80.070000 |
| T1 | 1440.0 | 460.899750 | 3.939347e-01 | 459.580000 | 460.650000 | 460.910000 | 461.140000 | 462.050000 |
| P4 | 1440.0 | 34.400000 | 8.884870e-13 | 34.400000 | 34.400000 | 34.400000 | 34.400000 | 34.400000 |
| deltaP | 1440.0 | -6.400000 | 2.486461e-05 | -6.400600 | -6.400000 | -6.400000 | -6.400000 | -6.399800 |
| P6 | 1440.0 | 27.999999 | 2.635231e-05 | 27.999000 | 28.000000 | 28.000000 | 28.000000 | 28.000000 |
| Fair | 1440.0 | 2.680753 | 4.127865e-03 | 2.622600 | 2.678700 | 2.680700 | 2.683000 | 2.699300 |
| T3 | 1440.0 | 1564.149375 | 2.865350e+00 | 1555.800000 | 1562.200000 | 1564.100000 | 1566.000000 | 1573.500000 |
| T2 | 1440.0 | 616.000063 | 7.019004e-03 | 615.980000 | 616.000000 | 616.000000 | 616.000000 | 616.020000 |
| Tr | 1440.0 | 969.001563 | 6.849410e-02 | 967.440000 | 968.980000 | 969.000000 | 969.020000 | 969.370000 |
| Treg | 1440.0 | 1249.965208 | 1.430992e-01 | 1248.800000 | 1249.900000 | 1250.000000 | 1250.000000 | 1250.100000 |
| Lsp | 1440.0 | 29.693318 | 1.550688e-01 | 27.519000 | 29.620000 | 29.698000 | 29.778000 | 30.236000 |
| Tcyc | 1440.0 | 1255.261111 | 1.539885e-01 | 1253.800000 | 1255.200000 | 1255.300000 | 1255.300000 | 1255.700000 |
| Tcyc - Treg | 1440.0 | 5.294570 | 8.754049e-02 | 4.489500 | 5.258475 | 5.286750 | 5.318325 | 5.881300 |
| Cco,sg | 1440.0 | 29875.119444 | 1.532124e+02 | 29012.000000 | 29838.000000 | 29875.000000 | 29916.000000 | 31371.000000 |
| Co2,sg | 1440.0 | 0.012540 | 3.653610e-04 | 0.009197 | 0.012382 | 0.012508 | 0.012643 | 0.015006 |
| P5 | 1440.0 | 24.900000 | 6.752501e-13 | 24.900000 | 24.900000 | 24.900000 | 24.900000 | 24.900000 |
| V4 | 1440.0 | 46.263708 | 3.639806e-01 | 45.379000 | 46.008750 | 46.373000 | 46.559000 | 46.840000 |
| V6 | 1440.0 | 27.734301 | 1.285316e+00 | 24.580000 | 28.212000 | 28.342500 | 28.408000 | 28.813000 |
| V7 | 1440.0 | 54.595515 | 8.454750e-02 | 53.401000 | 54.554000 | 54.595000 | 54.640250 | 54.975000 |
| V3 | 1440.0 | 46.992527 | 1.936154e-02 | 46.924000 | 46.979000 | 46.991000 | 47.005000 | 47.078000 |
| V1 | 1440.0 | 57.939251 | 2.065033e-01 | 57.343000 | 57.801000 | 57.934000 | 58.073000 | 58.619000 |
| V2 | 1440.0 | 45.301624 | 8.714133e-02 | 45.013000 | 45.255000 | 45.295000 | 45.341000 | 46.261000 |
| Frgc | 1440.0 | 49594.152778 | 7.121163e+01 | 48812.000000 | 49549.750000 | 49592.000000 | 49637.000000 | 49971.000000 |
| Fsc | 1440.0 | 49593.238194 | 7.082435e+01 | 49072.000000 | 49549.000000 | 49590.000000 | 49636.250000 | 49970.000000 |
| ACAB | 1440.0 | 280.764625 | 1.370150e+00 | 275.270000 | 279.730000 | 281.165000 | 281.860000 | 282.770000 |
| AWGC | 1440.0 | 204.625465 | 2.503719e+00 | 198.600000 | 202.837500 | 205.385000 | 206.680000 | 208.490000 |
| F5 | 1440.0 | 1990.668125 | 7.094397e+00 | 1970.200000 | 1985.900000 | 1990.500000 | 1995.300000 | 2014.000000 |
| F7 | 1440.0 | 3737.294375 | 7.181960e+00 | 3636.300000 | 3733.900000 | 3737.200000 | 3740.900000 | 3769.700000 |
| Fsg | 1440.0 | 160.845076 | 2.493207e-01 | 157.320000 | 160.720000 | 160.840000 | 160.980000 | 161.960000 |
| FV11 | 1440.0 | 28116.156250 | 2.742243e+02 | 27445.000000 | 27925.000000 | 28199.000000 | 28338.000000 | 28547.000000 |
| P1 | 1440.0 | 14.637996 | 1.825900e-04 | 14.637000 | 14.638000 | 14.638000 | 14.638000 | 14.641000 |
| P2 | 1440.0 | 35.041662 | 2.336692e-02 | 34.976000 | 35.027000 | 35.037000 | 35.055000 | 35.212000 |
| FLPG | 1440.0 | 3037.774583 | 4.419007e+01 | 2931.400000 | 3006.075000 | 3050.950000 | 3073.925000 | 3106.600000 |
| FLN | 1440.0 | 3917.950139 | 4.534886e+01 | 3853.200000 | 3878.675000 | 3904.650000 | 3953.225000 | 4029.600000 |
| FHN | 1440.0 | 708.818424 | 3.439050e+00 | 697.990000 | 706.720000 | 708.665000 | 710.982500 | 719.760000 |
| FLCO | 1440.0 | 1643.658125 | 3.754342e+00 | 1632.900000 | 1640.800000 | 1643.600000 | 1646.500000 | 1652.200000 |
| FSlurry | 1440.0 | 213.913611 | 3.699680e+00 | 205.930000 | 211.107500 | 214.075000 | 216.610000 | 224.070000 |
| FReflux | 1440.0 | 2984.433264 | 1.878808e+01 | 2938.600000 | 2972.300000 | 2987.000000 | 2996.500000 | 3029.200000 |
| Tfra | 1440.0 | 313.547090 | 1.204799e+00 | 310.710000 | 312.657500 | 313.895000 | 314.560000 | 315.330000 |
| T10 | 1440.0 | 509.836076 | 1.319542e-01 | 509.500000 | 509.740000 | 509.840000 | 509.930000 | 510.150000 |
| T20 | 1440.0 | 628.161611 | 1.590902e-01 | 627.840000 | 628.040000 | 628.140000 | 628.290000 | 628.470000 |
| V9 | 1440.0 | 47.511276 | 1.506524e+00 | 45.383000 | 46.214000 | 47.051000 | 48.650250 | 51.247000 |
| V8 | 1440.0 | 49.688417 | 1.914883e-01 | 49.287000 | 49.533750 | 49.700000 | 49.841000 | 50.131000 |
| V10 | 1440.0 | 49.368097 | 4.820776e-01 | 47.879000 | 49.072000 | 49.343000 | 49.672250 | 50.901000 |
| V11 | 1440.0 | 47.220298 | 2.061357e-01 | 46.630000 | 47.065750 | 47.218000 | 47.377000 | 47.686000 |
sns.heatmap(df_CAB_valveLeak.corr(),cmap='coolwarm')
for i in df_stableFeedFlow.columns:
    plt.figure(figsize=(12,2))
    plt.plot(df_stableFeedFlow[i])
    plt.xlabel('time (mins)')
    plt.ylabel(i)
    plt.title(df[df['Symbol']==i]['Description'].values[0])
    plt.show()
for i in df_varyingFeedFlow.columns:
    plt.figure(figsize=(12,2))
    plt.plot(df_varyingFeedFlow[i])
    plt.xlabel('time (mins)')
    plt.ylabel(i)
    plt.title(df[df['Symbol']==i]['Description'].values[0])
    plt.show()
for i in df_CAB_valveLeak.columns:
    plt.figure(figsize=(12,2))
    plt.plot(df_CAB_valveLeak[i])
    plt.xlabel('time (mins)')
    plt.ylabel(i)
    plt.title(df[df['Symbol']==i]['Description'].values[0])
    plt.show()
Scaling the data to mean = 0 and std = 1 using StandardScaler.
from sklearn.preprocessing import StandardScaler
ss = StandardScaler()
X = ss.fit_transform(df_stableFeedFlow)
Applying PCA
from sklearn.decomposition import PCA
pca = PCA()
X_pca = pca.fit_transform(X)
plt.figure(figsize=(15,6))
sns.set_style('whitegrid')
sns.lineplot(x=list(range(1,47)), y=np.cumsum(pca.explained_variance_ratio_), drawstyle='steps-pre')
plt.xlabel('Number of principal components')
plt.ylabel('Cumulative explained variance ratio')
plt.title('Cumulative variance explained by the principal components')
plt.show()
It can be clearly seen that 10 dimensions describe more than 98% of the variance, hence we reduce the feature space from 46 to 10.
pca = PCA(n_components=10)
X_pca = pca.fit_transform(X)
plt.figure(figsize=(15,6))
sns.set_style('whitegrid')
sns.lineplot(x=list(range(1,11)), y=np.cumsum(pca.explained_variance_ratio_), drawstyle='steps-pre')
plt.xlabel('Number of principal components')
plt.ylabel('Cumulative explained variance ratio')
plt.title('Cumulative variance explained by the principal components')
plt.show()
Applying Autoencoders
X_train = X.reshape(-1, 46, 1)  # 2880 samples × 46 features, one channel
Let's create a Sequential model with Bidirectional LSTM layers and train it on data from steady-state plant operation. A 20% dropout is applied after each recurrent layer to reduce overfitting.
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(256,return_sequences=True),input_shape=(46,1)))
model.add(tf.keras.layers.Dropout(0.2))
model.add(tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(128,return_sequences=True)))
model.add(tf.keras.layers.Dropout(0.2))
model.add(tf.keras.layers.Dense(1))
model.compile(optimizer='adam', loss='mse', metrics=['mae'])
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
 bidirectional (Bidirectional)      (None, 46, 512)           528384
 dropout (Dropout)                  (None, 46, 512)           0
 bidirectional_1 (Bidirectional)    (None, 46, 256)           656384
 dropout_1 (Dropout)                (None, 46, 256)           0
 dense (Dense)                      (None, 46, 1)             257
=================================================================
Total params: 1185025 (4.52 MB)
Trainable params: 1185025 (4.52 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
model.fit(X_train,X_train,epochs=30)
Training converges quickly: over 30 epochs the loss (MSE) falls from 0.1484 to 0.0014 and the MAE from 0.2095 to 0.0243 (roughly 30–55 s per epoch).
Calculating the reconstruction error using MAE. The 99th percentile of the training error is taken as the acceptable limit (control limit); values below it signify steady-state operation.
error_ae = []
for i in range(X.shape[0]):
    y_pred = model.predict(X[i].reshape(1,46,1),verbose=None)[0,:,0]
    error_ae.append(np.abs(X[i]-y_pred).sum())
AE_CL = np.percentile(error_ae,99)
pd.Series(error_ae).plot()
plt.hlines(AE_CL,0,len(error_ae),colors='red',linestyles='--')
Calculating the reconstruction error using the Q-test, the T²-test, and cosine similarity. Again, the 99th percentile of the training error is taken as the acceptable limit for steady-state operation.
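For reference, the two PCA statistics can be stated compactly: Q is the per-sample reconstruction residual, and T² = Σᵢ tᵢ²/λᵢ over the retained scores tᵢ with eigenvalues λᵢ. A small sanity check on random data (the `_demo` names are mine, chosen to avoid clashing with the notebook's variables) that the matrix form used below matches the direct per-component formula:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
X_demo = rng.normal(size=(100, 6))
X_demo = (X_demo - X_demo.mean(axis=0)) / X_demo.std(axis=0)  # standardized, like the notebook data

pca_demo = PCA(n_components=3).fit(X_demo)
scores_demo = pca_demo.transform(X_demo)

# Q: L1 reconstruction residual per sample (pca_demo.mean_ is ~0 after standardizing)
X_hat = scores_demo @ pca_demo.components_ + pca_demo.mean_
Q_demo = np.abs(X_demo - X_hat).sum(axis=1)

# T^2: the diagonal-inverse matrix form vs. the direct per-component formula
lambda_inv = np.linalg.inv(np.diag(pca_demo.explained_variance_))
T2_matrix = np.einsum('ij,jk,ik->i', scores_demo, lambda_inv, scores_demo)
T2_direct = (scores_demo**2 / pca_demo.explained_variance_).sum(axis=1)
assert np.allclose(T2_matrix, T2_direct)
```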
X_reconstructed = np.dot(X_pca,pca.components_)
error_pca = X-X_reconstructed
Q_train = np.sum(np.abs(error_pca),axis=1)
Q_CL = np.percentile(Q_train,99)
# Q_train plot with CL
plt.figure()
plt.plot(Q_train, color='black')
plt.plot([1,len(Q_train)],[Q_CL,Q_CL], linestyle='--',color='red', linewidth=2)
plt.xlabel('Sample #')
plt.ylabel('Q metric: training data')
plt.title(f'Q metric max: {Q_train.max():.3f} at {Q_train.argmax()} mins')
plt.show()
lambda_ = np.diag(pca.explained_variance_)
lambda_inv = np.linalg.inv(lambda_)
T_train = np.zeros(X_pca.shape[0])
for i in range(X_pca.shape[0]):
    T_train[i] = np.dot(np.dot(X_pca[i],lambda_inv),X_pca[i].T)
T_CL = np.percentile(T_train,99)
# T2_train plot with CL
plt.figure()
plt.plot(T_train, color='black')
plt.plot([1,len(T_train)],[T_CL,T_CL], linestyle='--',color='red', linewidth=2)
plt.xlabel('Sample #')
plt.ylabel('T$^2$ metric: training data')
plt.title(f'T$^2$ metric max: {T_train.max():.3f} at {T_train.argmax()} mins')
plt.show()
cosine = []
ed = []
X_reconstructed = np.dot(X_pca,pca.components_)
for i in range(X.shape[0]):
    v1 = X[i]
    v2 = X_reconstructed[i]
    cosine.append(np.dot(v1,v2)/(np.linalg.norm(v1)*np.linalg.norm(v2)))
    ed.append(np.linalg.norm(v1 - v2))
C_CL = np.min(cosine)
E_CL = np.percentile(ed,99)
pd.Series(cosine).plot(color='black')
plt.plot([1,len(cosine)],[C_CL,C_CL], linestyle='--',color='red', linewidth=2)
plt.xlabel('Sample #')
plt.ylabel('Cosine similarity metric: training data')
plt.title('Cosine Similarity')
plt.show()
Q_CL,T_CL,C_CL,E_CL,AE_CL
(4.123113215519088, 20.424350352568787, 0.9427550112367161, 0.9281694746755792, 0.4572564523844304)
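With these control limits in hand, one way to combine the detectors is a simple any-metric-breached rule. This `flag_sample` helper is a hypothetical sketch, not part of the original analysis; the limit values in the example are the rounded figures printed above:

```python
def flag_sample(q, t2, cos_sim, recon_err, q_cl, t_cl, c_cl, ae_cl):
    """Hypothetical combined rule: a sample is anomalous if Q or T^2 exceed
    their 99th-percentile limits, cosine similarity drops below the training
    minimum, or the autoencoder reconstruction error exceeds its limit."""
    return (q > q_cl) or (t2 > t_cl) or (cos_sim < c_cl) or (recon_err > ae_cl)

# e.g., with the control limits computed above (rounded):
print(flag_sample(2.0, 10.0, 0.99, 0.30, 4.1231, 20.4244, 0.9428, 0.4573))  # False: steady state
print(flag_sample(6.5, 10.0, 0.99, 0.30, 4.1231, 20.4244, 0.9428, 0.4573))  # True: Q limit breached
```

Using "any metric breached" keeps the detector sensitive; a stricter "majority of metrics" vote would trade sensitivity for fewer false alarms.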
Let's create functions for preprocessing the test data and evaluating it against our models.
def Q_test(X,X_pca,pca_components_,Q_CL):
    X_reconstructed = np.dot(X_pca,pca_components_)
    error_pca = X - X_reconstructed
    Q = np.sum(np.abs(error_pca),axis=1)
    # Q plot with control limit
    plt.figure()
    plt.plot(Q, color='black')
    plt.plot([1,len(Q)],[Q_CL,Q_CL], linestyle='--',color='red', linewidth=2)
    plt.xlabel('Sample #')
    plt.ylabel('Q metric: test data')
    plt.title(f'Q metric max: {Q.max():.3f} at {Q.argmax()} mins')
    plt.show()
    return error_pca
def T_test(X_pca,explained_variance_,T_CL):
    lambda_inv = np.linalg.inv(np.diag(explained_variance_))  # use the passed-in variances
    T = np.zeros(X_pca.shape[0])
    for i in range(X_pca.shape[0]):
        T[i] = np.dot(np.dot(X_pca[i],lambda_inv),X_pca[i].T)
    # T^2 plot with control limit
    plt.figure()
    plt.plot(T, color='black')
    plt.plot([1,len(T)],[T_CL,T_CL], linestyle='--',color='red', linewidth=2)
    plt.xlabel('Sample #')
    plt.ylabel('T$^2$ metric: test data')
    plt.title(f'T$^2$ metric max: {T.max():.3f} at {T.argmax()} mins')
    plt.show()
def cosine(X,X_transformed,pca_components_,C_CL,E_CL):
    X_reconstructed = np.dot(X_transformed,pca_components_)  # hoisted out of the loop
    cos_sim = []
    ed = []
    for i in range(X.shape[0]):
        v1, v2 = X[i], X_reconstructed[i]
        cos_sim.append(np.dot(v1,v2)/(np.linalg.norm(v1)*np.linalg.norm(v2)))
        ed.append(np.linalg.norm(v1 - v2))
    pd.Series(cos_sim).plot(color='black')
    plt.plot([1,len(cos_sim)],[C_CL,C_CL], linestyle='--',color='red', linewidth=2)
    plt.xlabel('Sample #')
    plt.ylabel('Cosine similarity metric: test data')
    plt.title('Cosine Similarity')
    plt.show()
def autoencoder(df_test,CL):
    X_test = ss.transform(df_test)
    error_ae = []
    error_sum = []
    for i in range(X_test.shape[0]):
        y_pred = model.predict(X_test[i].reshape(1,46,1),verbose=None)[0,:,0]
        error_ae.append(np.abs(X_test[i]-y_pred))
        error_sum.append(np.abs(X_test[i]-y_pred).sum())
    error_ae = np.array(error_ae)
    pd.Series(error_sum).plot(color='black')
    plt.hlines(CL,0,len(error_ae),colors='red',linestyles='--')
    plt.xlabel('Sample #')
    plt.ylabel('Reconstruction error by Autoencoder')
    plt.show()
    return error_ae
X = ss.transform(df_varyingFeedFlow)
X_test = pca.transform(X)
error_pca = Q_test(X,X_test,pca.components_,Q_CL)
T_test(X_test,pca.explained_variance_,T_CL)
cosine(X,X_test,pca.components_,C_CL,E_CL)
error_ae = autoencoder(df_varyingFeedFlow,AE_CL)
X = ss.transform(df_CAB_valveLeak)
X_test = pca.transform(X)
error_pca = Q_test(X,X_test,pca.components_,Q_CL)
T_test(X_test,pca.explained_variance_,T_CL)
cosine(X,X_test,pca.components_,C_CL,E_CL)
error_ae = autoencoder(df_CAB_valveLeak,AE_CL)
Inference
During steady-state operation the errors stay within their limits, but the error suddenly starts increasing after about 150 mins. So let's check which parameters deviate most from steady state, considering the top 10 variables responsible for the plant deviation.
Q-test Error
#%% Q contribution
error = np.abs(error_pca).sum(axis=1)
cum = []
for index,value in enumerate(error):
    if (value > Q_CL) and (len(cum) < 15):
        cum.append(value)
        if len(cum) == 15:
            sample = index   # first run of 15 consecutive limit violations
            break
    else:
        cum = []
print('Time-',sample,'mins')
error_test_sample = error_pca[sample]
Q_contri = np.abs(error_test_sample)  # vector of per-variable contributions
Time- 141 mins
plt.figure(figsize=[15,4])
plt.bar(df_CAB_valveLeak.columns, Q_contri)  # label bars with the actual variable names
plt.xticks(rotation = 80)
plt.ylabel('Q contributions')
plt.show()
plt.figure(figsize=(15,40))
print('Time-',sample,'mins')
for i,n in enumerate(np.argsort(Q_contri)[:-11:-1]):
    plt.subplot(5,2,i+1)
    plt.plot(df_CAB_valveLeak.iloc[:,n],'blue', linewidth=1)
    plt.xlabel('time (mins)')
    plt.ylabel(df['Symbol'][n])
    plt.title(df['Description'][n])
plt.show()
Time- 141 mins
Autoencoder Error
#%% Autoencoder Error
error = np.abs(error_ae).sum(axis=1)
cum = []
for index,value in enumerate(error):
    if (value > AE_CL) and (len(cum) < 15):
        cum.append(value)
        if len(cum) == 15:
            sample = index   # first run of 15 consecutive limit violations
            break
    else:
        cum = []
print('Time-',sample,'mins')
error_test_sample = error_ae[sample]   # use the autoencoder errors here, not error_pca
AE_contri = np.abs(error_test_sample)  # vector of per-variable contributions
Time- 186 mins
plt.figure(figsize=(15,45))
print('Time-',sample,'mins')
for i,n in enumerate(np.argsort(error_ae[sample])[:-11:-1]):
    plt.subplot(5,2,i+1)
    plt.plot(df_CAB_valveLeak.iloc[:,n],'blue', linewidth=1)
    plt.xlabel('time (mins)')
    plt.ylabel(df['Symbol'][n])
    plt.title(df['Description'][n])
plt.show()
Time- 186 mins
The regenerator temperature control valve opening has suddenly increased, indicating an abnormal situation in which the catalyst is being cooled; to maintain the catalyst temperature, the air flow is increased. This points to a possible behind-the-burn condition or a drop in the air flow rate: the trends show a stable catalyst flow and reactor temperature profile until about 250 mins, whereas the air flow control valve starts opening at about 100 mins.